Abstract

Eigen's quasi-species model describes viruses as ensembles of different mutants of a high-fitness "master" genotype. Mutants are assumed to have lower fitness than the master type, yet they coexist with it, forming the quasi-species. When the mutation rate is sufficiently high, the master type no longer survives and is replaced by a wide range of mutant types, thus destroying the quasi-species. This is the so-called "error catastrophe". But natural selection acts on phenotypes, not genotypes, and a huge number of genotypes yield the same phenotype. An important consequence of this is the appearance of beneficial mutations, which increase the fitness of mutants. A model has recently been proposed to describe quasi-species in the presence of beneficial mutations. This model lacks the error catastrophe of Eigen's model and predicts a steady state in which the viral population grows exponentially. Extinction can only occur if the infectivity of the quasi-species is so low that this exponential growth turns into an exponential decay. In this work I investigate the transient of this model when infection is started from a small amount of low-fitness virions. I prove that, beyond an initial regime in which the viral population decreases (and can go extinct), the growth of the population is super-exponential. Hence this population quickly becomes so huge that selection due to the lack of host cells to be infected begins to act before the steady state is reached. This result suggests that a viral infection may spread widely before the virus has developed its optimal form.
1. Introduction

It seems that an unavoidable consequence of the increase in complexity of a system is the appearance of parasites. These are entities able to exploit backdoors, bypasses, holes... of the system for their own benefit, sometimes even at a cost to the system. We see a huge variety of these parasites in biology, ranging from viruses to humans. Society, in fact, is one of those complex systems amenable to exploitation by free-riders (the paradigm of the Public Goods game [1] is but one prominent acknowledgment of the existence of this social parasitism). More recently, the widespread use of computers and the arrival of the Internet have made us witness the emergence and proliferation of computer viruses, trojans, worms, spam, phishing, and all kinds of parasitism, which flood the web using the same mechanisms that are meant to allow the transmission of information. Apparently, whenever a complex mechanism emerges, it is soon invaded by its specific parasites.

Parasites need not be complex: on the contrary, by being very specific to a particular mechanism, they are able to do their job with very simple means. Paradigmatic among parasites for their extreme simplicity are viruses. Their success is such that they are the most abundant life forms on Earth [2]. Their existence is an unavoidable outcome of the evolutionary process itself. In fact, the most common strategy of RNA viruses is to have a very high reproductive rate, which yields a wide variety of mutants [3]. This ensures their fast adaptation to almost any change.

One of the most important challenges in current medical research is how to fight viruses, and one of the most studied strategies is the design of therapies able to induce viral extinction. Increasing the mutation rate has been successful, at least in in vitro experiments, but there is no consensus as to why the virus loses infectivity at high mutation rates [5–10]. The pioneering work of Eigen [8] explains viral extinction through a mechanism known as the error threshold. According to it, the progeny loses its identity if the mutation rate grows above a given value, which is inversely proportional to the length of the replicating molecule, hence putting an upper bound on the complexity of viruses. This classical theory is currently questioned. The current state of the art of the evolutionary paradigm contradicts some of the basic assumptions of Eigen's theory that are crucial for the existence of the error threshold. Alternative mechanisms may lead to viral extinction for reasons other than this hypothetical error threshold (such as the presence of defective forms of the virus, the competition induced by geometrical constraints, etc.).

Models of viral evolution need to make simplifying assumptions, and real virus behavior often deviates substantially from their predictions [16]. Current quasi-species models assume high mutation rates that give rise to heterogeneous populations. This is consistent with experimental observations. However, one common approximation is to consider that all new mutations have a deleterious effect on fitness, thus neglecting beneficial and neutral mutations. This is true if, as the theory assumes, there is a unique master sequence of high fitness. But we now know that the genotype-phenotype map is extremely redundant, and that a huge number of different sequences, forming so-called neutral networks [17], yield phenotypes that perform equally well. The increase in the rate of beneficial and neutral mutations that this effect brings about invalidates the classical theory of the error threshold [7] and calls for alternative models of viral evolution and extinction.

The aim of this paper is to explore one such model, introduced by Manrubia et al. [18], with special focus on its transient behavior.



2. Quasi-species equation 

Evolution is a result of the simultaneous action of three processes: replication, mutation and selection. Any set of agents undergoing these three processes evolves (in the direction determined by selection) regardless of whether they are biological entities, computer programs, cultural traits, etc.

Replication is the ability of some agents to produce identical copies of themselves. Replication is normally a stochastic process, characterized by a probability distribution $p(k)$, $k = 0, 1, 2, \dots$, representing the probability that after a replication event (however we define it) there are $k$ replicas of the parent agent (including itself).
The replication process is usually imperfect. Most often, errors in making copies yield invalid individuals (unable to produce further copies); however, sometimes these errors produce valid individuals, albeit of a different type (or species). This kind of altered replication is referred to as mutation. Mutations create new species and maintain variability within populations. New species may have modified replicative abilities, so a probability distribution $p_i(\mathbf{k})$ must be introduced for each species $i$, where $\mathbf{k} = (k_1, k_2, \dots, k_s)$ is a vector denoting the number of offspring of each of the $s$ possible resulting species that an individual of species $i$ gives rise to.

The replication with mutation of an individual of any of the valid species generates a Markov process in discrete time known as a multi-type branching process [19]. The variable characterizing this process is the population of each species at generation $t$, $\mathbf{Z}(t) = (Z_1(t), Z_2(t), \dots, Z_s(t))$. The mean value of this variable, $\mathbf{n}(t) = E[\mathbf{Z}(t)]$, has the simple evolution equation

$$ \mathbf{n}(t+1) = \mathbf{n}(t)\,W, \tag{1} $$

where $W = (w_{ij})$ is the replication-mutation matrix. The number $r_i = \sum_j w_{ij}$ denotes the average number of offspring that an individual of species $i$ produces in a replication event, and $q_{ij} = w_{ij}/r_i$ is the probability that one of these offspring mutates to species $j$. So, introducing the stochastic matrix $Q = (q_{ij})$ (mutation matrix) and the diagonal matrix $R = (r_i\delta_{ij})$ (replication matrix), we can factorize $W = RQ$, thus separating the effects of replication and mutation in the evolution of $\mathbf{n}(t)$.
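To make the factorization concrete, here is a minimal sketch (Python, standard library only) that splits a replication-mutation matrix into the replication rates $r_i$ and a stochastic mutation matrix $Q$. The 2×2 matrix below is a hypothetical example, not one taken from the models in this paper, and the sketch assumes every row sum $r_i$ is positive (all species are valid replicators). Exact fractions are used so that the identity $W = RQ$ can be checked without rounding.

```python
from fractions import Fraction


def factorize(W):
    """Split W into replication rates r_i (diagonal of R) and a stochastic Q, so that W = R Q."""
    r = [sum(row) for row in W]                            # r_i = sum_j w_ij
    Q = [[w / ri for w in row] for row, ri in zip(W, r)]   # q_ij = w_ij / r_i
    return r, Q


def diag_times(r, Q):
    """Return the product R Q, where R = diag(r)."""
    return [[ri * qij for qij in row] for ri, row in zip(r, Q)]


# Hypothetical two-species example with exact rational entries.
W = [[Fraction(3, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
r, Q = factorize(W)
print(r)                                 # [Fraction(2, 1), Fraction(1, 1)]
print(all(sum(row) == 1 for row in Q))   # True: Q is stochastic
print(diag_times(r, Q) == W)             # True: W = R Q
```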

The asymptotic behavior of this equation is given by $\mathbf{n}(t) \approx \lambda^t\mathbf{u}$, where $\lambda$ is the largest eigenvalue of $W$ and $\mathbf{u}$ a (positive) left eigenvector of its corresponding eigenspace. The population grows exponentially if $\lambda > 1$, or vanishes exponentially if $\lambda < 1$.¹

We have not considered selection yet. Selection is induced by the environment, usually through a finite availability of resources for replication. Selection thus acts on the specific replicative ability of each species, modifying the values of $r_i$. When scarcity of resources affects all species equally, all values of $r_i$ are affected equally. In that case, what determines the fate of each species is its asymptotic fraction within the population. At generation $t$ the fractions of population of each species are given by the vector $\mathbf{x}(t) = \mathbf{n}(t)/(\mathbf{n}(t)\cdot\mathbf{1})$, where $\mathbf{1} = (1, \dots, 1)$. Equation (1) then becomes

$$ \mathbf{x}(t+1) = \phi(t)^{-1}\,\mathbf{x}(t)\,W, \qquad \phi(t) = \mathbf{x}(t)\,W\,\mathbf{1}^{T} = \sum_i r_i x_i(t), \tag{2} $$

where we have used the factorization $W = RQ$ and the fact that $Q$ is stochastic (hence $Q\mathbf{1}^T = \mathbf{1}^T$). The function $\phi(t)$ represents the mean replicative ability of the population at generation $t$. Equation (2) is referred to as the quasi-species equation.

The steady state of equation (2) is obtained by solving the eigenvalue problem $\mathbf{x}W = \phi\mathbf{x}$ under the constraint $\sum_i x_i = 1$, $x_i \ge 0$, $i = 1, \dots, s$. If $Q$ is an irreducible matrix, $\phi$ and $\mathbf{x}$ are, respectively, the largest eigenvalue and its corresponding (unique) normalized left eigenvector of matrix $W$.
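Iterating equation (2) is also a practical way to find this steady state: when the largest eigenvalue of $W$ is strictly dominant (for instance, when all entries of $W$ are positive), the normalized iterates $\mathbf{x}(t)$ converge to $\mathbf{x}$ and $\phi(t)$ converges to that eigenvalue. The following is a minimal sketch under that assumption; the two-species matrix (built from the hypothetical choice $r = (2, 1)$ and $Q$ with entries 0.9 and 0.1) and the stopping tolerance are illustrative, not taken from the text.

```python
def quasispecies_steady_state(W, x0, tol=1e-13, max_iter=10_000):
    """Iterate eq. (2), x(t+1) = x(t) W / phi(t), until x(t) stops changing.

    Returns (phi, x). Assumes x0 sums to 1 and that W is nonnegative with a
    strictly dominant largest eigenvalue, so that the iteration converges.
    """
    s = len(W)
    x = list(x0)
    for _ in range(max_iter):
        xW = [sum(x[i] * W[i][j] for i in range(s)) for j in range(s)]
        phi = sum(xW)                       # phi(t) = x(t) W 1^T
        new_x = [v / phi for v in xW]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return phi, new_x
        x = new_x
    raise RuntimeError("iteration did not converge")


# Hypothetical example: r = (2, 1), Q = [[0.9, 0.1], [0.1, 0.9]], so W = R Q.
W = [[1.8, 0.2],
     [0.1, 0.9]]
phi, x = quasispecies_steady_state(W, [0.5, 0.5])
print(round(phi, 4))   # 1.8217, the largest eigenvalue of W
```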

3. Error catastrophe 

Eigen proposed the quasi-species equation as a model for the evolution of prebiotic replicators [8] which, in the absence of correction mechanisms, had a high mutation rate and accordingly a short length. However, it has become a paradigm of viral evolution even for much longer sequences (RNA, DNA, proteins...) [5, 21]. To envisage Eigen's idea we can think of a space of $L$-long sequences, labelled $i = 0, 1, \dots, s$. Each position of these sequences can be occupied by any element of a given set (DNA or RNA bases, alleles of genes, amino acids...). Let us assume that this set contains $a$ elements ($a = 4$ for bases, $a = 20$ for amino acids...). Mutations are point-like, i.e., substitutions of the element at a single position by any other in the set. Thus sequences ACGGCA and AGGGCA are reached from each other by a single mutation, whereas ACGGCA and AGGGCC are two mutations apart. Any offspring of the replicated sequence will carry a given point mutation with probability $0 < \mu \ll 1$. The sequence labelled $i = 0$ (master sequence) is assumed to have a higher replicative ability (henceforth fitness) than any other sequence. For simplicity, all sequences are assigned fitness 1, whereas the master sequence has fitness $f > 1$. We shall denote the fraction of population of the master sequence by $x$.

¹ If $\lambda = 1$ the process is "critical", and it can be proven to go extinct in finite time with probability one [19].

An important assumption in Eigen's model is that backward mutations that recover the master sequence are neglected. This is a reasonable assumption considering that sequences in nature tend to be very long. The master sequence is recovered with probability $(\mu/D)^h$, where $h$ is the Hamming distance (number of different positions) between the given sequence and the master sequence, and $D = (a-1)L$ is the number of point mutants of an $L$-long sequence.
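The combinatorics in this paragraph can be checked directly. The sketch below (Python, standard library only) assumes the four-letter DNA alphabet of the ACGGCA example; it computes Hamming distances and enumerates the point mutants of a sequence, confirming that there are $D = (a-1)L$ of them.

```python
def hamming(a, b):
    """Hamming distance h: number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def point_mutants(seq, alphabet="ACGT"):
    """All sequences that differ from seq by exactly one substitution."""
    return [seq[:i] + c + seq[i + 1:]
            for i, base in enumerate(seq)
            for c in alphabet if c != base]


print(hamming("ACGGCA", "AGGGCA"))    # 1
print(hamming("ACGGCA", "AGGGCC"))    # 2
print(len(point_mutants("ACGGCA")))   # 18 = (a - 1) L with a = 4, L = 6
```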

Under the above assumptions the quasi-species equation (2) reads

$$ x\left[f(1 - \mu D) + \epsilon\right] = x\phi, \qquad \phi = 1 + (f - 1)x, \tag{3} $$

where $\epsilon$ contains those backward mutations that the theory neglects. Setting $\epsilon = 0$, this equation predicts

$$ x = \begin{cases} 1 - \dfrac{f}{f-1}\,\mu D & \text{if } \mu D < 1 - f^{-1}, \\[1ex] 0 & \text{if } \mu D \ge 1 - f^{-1}; \end{cases} \tag{4} $$

in other words, if the mutation rate is above a threshold (which decreases as $L^{-1}$), the master sequence accumulates so many mutations that it gets lost in a cloud of mutants. This transition is known as the error catastrophe and has inspired a line of research on therapies against viral infections based on increasing $\mu$ through the addition of mutagens [22].
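As a quick numerical illustration of (3) and (4), the sketch below evaluates the stationary master fraction and checks that it satisfies (3) with $\epsilon = 0$. The fitness $f = 2$, the values of $\mu D$ and the sequence length are arbitrary illustrative choices; the helper `critical_mu` is hypothetical and simply rewrites the threshold $\mu D < 1 - 1/f$ using $D = (a-1)L$.

```python
def master_fraction(f, muD):
    """Stationary master-sequence fraction x of eq. (4), with backward mutations neglected."""
    if f <= 1:
        raise ValueError("the master sequence must have fitness f > 1")
    if muD < 1 - 1 / f:
        return 1 - f * muD / (f - 1)
    return 0.0                       # beyond the error threshold the master type is lost


def critical_mu(f, a, L):
    """Error threshold per point mutation: mu_c = (1 - 1/f) / ((a - 1) L)."""
    return (1 - 1 / f) / ((a - 1) * L)


f = 2.0                              # illustrative fitness of the master sequence
for muD in (0.1, 0.25, 0.4, 0.6):
    x = master_fraction(f, muD)
    residual = x * f * (1 - muD) - x * (1 + (f - 1) * x)   # eq. (3) with epsilon = 0
    print(muD, round(x, 3), abs(residual) < 1e-12)
print(critical_mu(f, a=4, L=100))    # threshold for a 100-base sequence
```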



4. Phenotype vs. genotype 

But Eigen's model is fundamentally flawed in assuming the existence of a single master sequence or genotype. Biology is extremely redundant. DNA codes for proteins using a (nearly) universal genetic code based on triplets of bases, or codons. Each codon codes for an amino acid. But the 64 possible codons only code for 20 amino acids plus a STOP signal. For amino acids encoded by several codons, the third base is typically irrelevant or nearly so. This means that many mutations changing a base pair in the DNA sequence remain silent when translated into proteins. Proteins, in turn, fold into a three-dimensional structure which determines their function, and only a few amino acids at selected positions are key to this folding. So the replacement of many of them leaves the protein structure (hence its function) intact. Evolution can only act on the macroscopic features of living beings (their phenotype), which are blind to a huge number of mutations. In other words, the mapping from genotypes into phenotypes is many-to-one. The existence of a master sequence is therefore a fiction. At most we can only speak of a master phenotype.
The distribution of genotypes corresponding to a given phenotype on genotype space is rather complicated. Basically, they form so-called neutral networks [17], i.e., connected components of the mutation graph through which sequences can be changed by successive mutations without ever changing the phenotype, hence their fitness. The most relevant consequence of the existence of neutral networks is that backward (or beneficial) mutations are not negligible, because recovering the master phenotype (not genotype) is no longer an improbable event. Changing Eigen's model to account for beneficial mutations eliminates the error catastrophe, as we will see in what follows.



5. A model with beneficial mutations 



A simple model accounting for the existence of neutral networks has recently been proposed [18]. In this model, viral phenotypes are characterized by their replicative abilities, $r \in \{0, 1, \dots, R\}$. The only mutations that the model takes into account are those connecting neighboring classes (i.e., the effect of a mutation is a slight increase or decrease in the replicative ability). An offspring undergoes a deleterious mutation from class $r$ to class $r - 1$ with probability $p$, and a beneficial mutation from class $r$ to class $r + 1$ with probability $q$. In general it is assumed that $0 < q \ll p \ll 1$. If we denote by $n_r(t)$ the mean number of viral particles in class $r$ at generation $t$, then

$$
\begin{aligned}
n_r(t+1) &= (1 - p - q)\,r\,n_r(t) + p\,(r+1)\,n_{r+1}(t) + q\,(r-1)\,n_{r-1}(t), \qquad r = 1, 2, \dots, R-1, \\
n_R(t+1) &= (1 - p)\,R\,n_R(t) + q\,(R-1)\,n_{R-1}(t).
\end{aligned} \tag{5}
$$

Here $R$ stands for the maximum replicative ability of the virus. There is also a class $r = 0$, with no replicative ability, whose population is maintained by deleterious mutations from class $r = 1$. Hence $n_0(t+1) = p\,n_1(t)$.
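A direct transcription of (5) into code may help when experimenting with the model. The sketch below (Python, standard library only) advances the mean class populations by one generation; the list layout (`n[r - 1]` holds $n_r$) and the parameter values in the usage line are illustrative assumptions. Returning the class-0 production separately makes offspring conservation easy to check: the new populations plus $n_0$ add up to $\sum_r r\,n_r(t)$.

```python
def step(n, p, q):
    """One generation of eqs. (5) for the classes r = 1..R, with n[r - 1] = n_r(t).

    Returns (new_n, n0), where n0 = p * n_1(t) is the population of the
    non-replicating class r = 0 produced in this generation.
    """
    R = len(n)
    new = [0.0] * R
    for r in range(1, R + 1):
        offspring = r * n[r - 1]                  # each class-r particle yields r copies on average
        stay = (1 - p - q) if r < R else (1 - p)  # class R has no class above it
        new[r - 1] += stay * offspring
        if r > 1:
            new[r - 2] += p * offspring           # deleterious mutation r -> r - 1
        if r < R:
            new[r] += q * offspring               # beneficial mutation r -> r + 1
    return new, p * n[0]


# One particle in class 2 of a hypothetical model with R = 3, p = 0.1, q = 0.05.
new, n0 = step([0.0, 1.0, 0.0], p=0.1, q=0.05)
print(new, n0)   # approximately [0.2, 1.7, 0.1] and 0.0
```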

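Equations (5) have the form of (1) for $W = RQ$ with

$$
Q = \begin{pmatrix}
1-p-q & q & & & \\
p & 1-p-q & q & & \\
 & \ddots & \ddots & \ddots & \\
 & & p & 1-p-q & q \\
 & & & p & 1-p
\end{pmatrix}, \qquad
R = \begin{pmatrix}
1 & & & \\
 & 2 & & \\
 & & \ddots & \\
 & & & R
\end{pmatrix}. \tag{6}
$$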



Notice, however, that $Q$ is only sub-stochastic (its first row sums to $1 - p$) because class $r = 0$ is not included. This fact may cause the total extinction of the virus. Still, the eigenvalue equation $\phi\mathbf{u} = \mathbf{u}W$ determines the asymptotic behavior of the system, $\mathbf{n}(t) \sim \phi^t\mathbf{u}$. Both $\phi$ and $\mathbf{u}$ are unique because $W$ is irreducible (for $q > 0$). The vector $\mathbf{u}$, normalized as $\mathbf{u}\cdot\mathbf{1} = 1$, describes the asymptotic fractions of viral particles in each class, even in the case that the virus eventually goes extinct.

For $q = 0$ it is easy to check that $\lambda_r = r(1-p)$ and $\mathbf{v}_r = (v_{r1}, \dots, v_{rR})$, with $v_{rk} = \binom{r}{k}(1-p)^k p^{r-k}$, $r, k = 1, \dots, R$ (where $\binom{r}{k} = 0$ for $k > r$), are the eigenvalues and left eigenvectors of matrix $W$, respectively. Since for every $p$ the largest eigenvalue is $\phi = R(1-p)$, we find that $p_c = 1 - R^{-1}$ defines a transition value such that the virus proliferates for $p < p_c$ but goes extinct for $p > p_c$. This transition is similar to Eigen's error catastrophe, except for the fact that here the virus becomes extinct because the lowest fitness class is $r = 0$, unable to infect further cells.
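The $q = 0$ eigenpairs can be verified numerically. The sketch below (Python, standard library only; $R = 6$ and $p = 0.2$ are arbitrary illustrative values) builds $W = RQ$ from (6) and measures how far each binomial vector $\mathbf{v}_r$ is from satisfying $\mathbf{v}_r W = r(1-p)\,\mathbf{v}_r$. Since $W$ is lower triangular when $q = 0$, its eigenvalues are simply its diagonal entries $r(1-p)$.

```python
from math import comb


def build_W(R, p, q):
    """W = R Q from eq. (6) for classes r = 1..R (row/column i is class i + 1)."""
    W = [[0.0] * R for _ in range(R)]
    for i in range(R):
        r = i + 1
        W[i][i] = r * ((1 - p - q) if r < R else (1 - p))
        if i > 0:
            W[i][i - 1] = r * p        # deleterious mutation r -> r - 1
        if i < R - 1:
            W[i][i + 1] = r * q        # beneficial mutation r -> r + 1
    return W


def left_residual(v, W, lam):
    """max_k |(v W)_k - lam * v_k|; zero when v is a left eigenvector with eigenvalue lam."""
    n = len(v)
    vW = [sum(v[i] * W[i][k] for i in range(n)) for k in range(n)]
    return max(abs(a - lam * b) for a, b in zip(vW, v))


R, p = 6, 0.2                          # illustrative values
W = build_W(R, p, q=0.0)
for r in range(1, R + 1):
    # comb(r, k) = 0 for k > r, so those components vanish
    v = [comb(r, k) * (1 - p) ** k * p ** (r - k) for k in range(1, R + 1)]
    print(r, r * (1 - p), left_residual(v, W, r * (1 - p)))
print("p_c =", 1 - 1 / R)
```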
6. Transient and the infinite classes model



As for the case $q = 0$, for $q > 0$ we expect that the largest eigenvalue is $\phi = O(R)$, so a model with an infinite number of classes will never reach the asymptotic state. However, such a model can be useful to study the initial stages of the transient behavior, provided $R \gg 1$ and the initial population has a low fitness $r_0 \ll R$. The reason is that classes above $r_0$ get populated one by one, so at least for $0 \le t \le R - r_0$ there is no difference between the model with $R < \infty$ and the one with $R = \infty$. This is the regime I plan to analyze here.

So consider that the first of equations (5) holds for all $r \in \mathbb{N}$, and assume that $n_r(0) = 0$ for all $r > r_0$. Then the generating function

$$ G(z,t) = \sum_{r=1}^{\infty} z^r n_r(t) \tag{7} $$

will be a polynomial of degree at most $r_0 + t$. Multiplying (5) by $z^r$ and adding up for all $r \ge 1$ we obtain

$$ G(z,t+1) = \left[p + (1-p-q)z + qz^2\right] G_z(z,t) - p\,n_1(t). \tag{8} $$

(Subindices in functions denote partial derivatives.)
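Equation (8) is easy to check coefficient by coefficient. In the sketch below (Python, standard library only), a polynomial is stored as its list of coefficients; `step_infinite` applies the first of equations (5) to every class $r \ge 1$, and `rhs_eq8` builds the right-hand side of (8). The populations and the values of $p$, $q$ are hypothetical illustrative numbers.

```python
def step_infinite(n, p, q):
    """First of eqs. (5) applied to every class r >= 1 (no upper class R).

    n[r] holds n_r(t); n[0] is unused and kept at 0. The returned list is one
    entry longer because the class above the highest populated one fills up.
    """
    m = len(n)
    new = [0.0] * (m + 1)
    for r in range(1, m):
        offspring = r * n[r]
        new[r] += (1 - p - q) * offspring
        if r > 1:
            new[r - 1] += p * offspring      # class 1 loses p to the uncounted class 0
        new[r + 1] += q * offspring
    return new


def rhs_eq8(n, p, q):
    """Coefficients (powers of z) of [p + (1-p-q) z + q z^2] G_z(z, t) - p n_1(t)."""
    m = len(n)
    Gz = [(k + 1) * n[k + 1] for k in range(m - 1)]   # coefficient of z^k in G_z
    out = [0.0] * (m + 1)
    for k, c in enumerate(Gz):
        out[k] += p * c
        out[k + 1] += (1 - p - q) * c
        out[k + 2] += q * c
    out[0] -= p * n[1]
    return out


# Hypothetical populations n_1..n_3 (index 0 is the excluded class r = 0).
n = [0.0, 0.3, 1.0, 0.5]
lhs = step_infinite(n, p=0.1, q=0.02)   # coefficients of G(z, t + 1)
rhs = rhs_eq8(n, p=0.1, q=0.02)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))   # ~0, up to rounding
```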
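Let us now introduce the generating functions

$$ N_r(s) = \sum_{t=0}^{\infty} \frac{s^t}{t!}\, n_r(t), \qquad F(z,s) = \sum_{t=0}^{\infty} \frac{s^t}{t!}\, G(z,t) = \sum_{r=1}^{\infty} z^r N_r(s). \tag{9} $$

In terms of them equation (8) becomes

$$ F_s(z,s) = p\left(1 + \frac{z}{w_-}\right)\left(1 + \frac{z}{w_+}\right) F_z(z,s) - p\,N_1(s), \tag{10} $$

where

$$ w_\pm = \frac{1 - p - q \pm \Omega}{2q}, \qquad \Omega = \sqrt{1 - 2(p+q) + (p-q)^2}. \tag{11} $$

Here $w_\pm$ are the roots of $qw^2 - (1-p-q)w + p = 0$, so that $w_+w_- = p/q$ and $w_+ + w_- = (1-p-q)/q$. The condition for $\Omega$ to be real and positive is $\sqrt{p} + \sqrt{q} < 1$. This condition holds whenever $0 < q < p < 1/4$. As $p = 1/4$ is an extremely high mutation rate, we shall take for granted that $\Omega \in \mathbb{R}^+$.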
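The factorization used in (10) can be checked numerically. The sketch below (Python, standard library only) computes $\Omega$ and $w_\pm$ from (11) for the illustrative values $p = 0.1$, $q = 0.01$, and compares $p(1 + z/w_-)(1 + z/w_+)$ with the quadratic $p + (1-p-q)z + qz^2$ of (8) at a few points.

```python
import math


def roots(p, q):
    """Omega and w_+, w_- of eq. (11); requires q > 0 and sqrt(p) + sqrt(q) < 1."""
    if q <= 0 or math.sqrt(p) + math.sqrt(q) >= 1:
        raise ValueError("need q > 0 and sqrt(p) + sqrt(q) < 1")
    omega = math.sqrt(1 - 2 * (p + q) + (p - q) ** 2)
    w_plus = (1 - p - q + omega) / (2 * q)
    w_minus = (1 - p - q - omega) / (2 * q)
    return omega, w_plus, w_minus


p, q = 0.1, 0.01                      # illustrative values with 0 < q < p < 1/4
omega, wp, wm = roots(p, q)
for z in (-0.5, 0.0, 0.7, 2.0):
    factored = p * (1 + z / wm) * (1 + z / wp)     # coefficient in eq. (10)
    expanded = p + (1 - p - q) * z + q * z ** 2    # coefficient in eq. (8)
    print(z, factored - expanded)                  # ~0 up to rounding
```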

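The first-order partial differential equation (10) needs to be supplemented with an initial condition for $F(z,s)$. Indeed, if $\{n_r(0)\}_{r \ge 1}$ is the initial condition of the viral populations, then

$$ F(z,0) = G(z,0) = g(z) = \sum_{r=1}^{\infty} z^r n_r(0). \tag{12} $$

The characteristic curves of equation (10) are given by

$$ \left(1 + \frac{z}{w_-}\right)\left(1 + \frac{z}{w_+}\right)^{-1} e^{\Omega s} = \zeta, \tag{13} $$

with $\zeta$ an arbitrary constant. We can solve this equation for $z$ to get

$$ z(\zeta,s) = \frac{p}{q}\,\frac{\zeta e^{-\Omega s} - 1}{w_+ - w_-\zeta e^{-\Omega s}}. \tag{14} $$

In terms of the variables $(\zeta,s)$, and denoting $f(\zeta,s) = F(z(\zeta,s),s)$, equation (10) becomes

$$ -p^{-1} f_s(\zeta,s) = N_1(s), \qquad f(\zeta,0) = g(z(\zeta,0)) = g\!\left(\frac{p(\zeta-1)}{q(w_+ - w_-\zeta)}\right), \tag{15} $$

whose solution is

$$ f(\zeta,s) = g\!\left(\frac{p(\zeta-1)}{q(w_+ - w_-\zeta)}\right) - p\int_0^s N_1(u)\,du. \tag{16} $$

Substituting (13) into (16) yields

$$ F(z,s) = g(\psi(z,s)) - p\int_0^s N_1(u)\,du, \qquad \psi(z,s) = \frac{z + w_+\chi(z,s)}{1 - \chi(z,s)}, \qquad \chi(z,s) = \frac{w_- + z}{w_+ - w_-}\left(e^{\Omega s} - 1\right). \tag{17} $$

It only remains to determine $N_1(s) = F_z(0,s)$. This can be achieved by imposing $F(0,s) = 0$ in (17) (because the sum in (7) starts at $r = 1$, $G(0,t) = 0$), which leads to

$$ p\int_0^s N_1(u)\,du = g(\psi(0,s)). $$

Thus the final expression of the generating function $F(z,s)$ is

$$ F(z,s) = g(\psi(z,s)) - g(\psi(0,s)). \tag{18} $$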
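Because (18) is the crux of the derivation, it is worth checking it against brute force. The sketch below (Python, standard library only) compares $F(z,s)$ from (17)–(18), for the single-particle initial condition $g(z) = z^{r_0}$, with the truncated series (9) obtained by iterating the infinite-class version of (5). The values $p = 0.1$, $q = 0.01$, $r_0 = 2$, the point $s = 1$ (for these parameters the series converges for $|s|$ up to about 4.9) and the 80-term truncation are illustrative assumptions.

```python
import math


def closed_form_F(z, s, p, q, r0):
    """F(z, s) from eqs. (17)-(18) with g(z) = z**r0 (one particle of class r0 at t = 0)."""
    omega = math.sqrt(1 - 2 * (p + q) + (p - q) ** 2)
    wp = (1 - p - q + omega) / (2 * q)
    wm = (1 - p - q - omega) / (2 * q)

    def psi(zz):
        chi = (wm + zz) / (wp - wm) * math.expm1(omega * s)   # chi(z, s) of eq. (17)
        return (zz + wp * chi) / (1 - chi)

    return psi(z) ** r0 - psi(0.0) ** r0


def series_F(z, s, p, q, r0, T=80):
    """Truncated sum over t of G(z, t) s^t / t!, iterating eqs. (5) with infinitely many classes."""
    n = [0.0] * (r0 + 1)                 # n[r] = n_r(t); n[0] (class 0) is never counted
    n[r0] = 1.0
    total, weight = 0.0, 1.0             # weight = s^t / t!
    for t in range(T):
        total += weight * sum(c * z ** r for r, c in enumerate(n))
        weight *= s / (t + 1)
        new = [0.0] * (len(n) + 1)
        for r in range(1, len(n)):
            off = r * n[r]
            new[r] += (1 - p - q) * off
            if r > 1:
                new[r - 1] += p * off
            new[r + 1] += q * off
        n = new
    return total


for z in (1.0, 0.5):
    print(z, closed_form_F(z, 1.0, 0.1, 0.01, 2), series_F(z, 1.0, 0.1, 0.01, 2))
```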

7. Asymptotic behavior of the transient 

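Setting $z = 1$ in (9) we get $F(1,s) = \sum_{t=0}^{\infty} n(t)\, s^t/t!$, the generating function of the total population of the virus, $n(t) = \sum_{r=1}^{\infty} n_r(t)$. From (18), $F(1,s) = g(\psi(1,s)) - g(\psi(0,s))$, where

$$ \psi(1,s) = \frac{1 + w_+\chi(1,s)}{1 - \chi(1,s)}, \qquad \chi(1,s) = \frac{1 + w_-}{w_+ - w_-}\left(e^{\Omega s} - 1\right), \tag{19} $$

$$ \psi(0,s) = \frac{w_+\chi(0,s)}{1 - \chi(0,s)}, \qquad \chi(0,s) = \frac{w_-}{w_+ - w_-}\left(e^{\Omega s} - 1\right). \tag{20} $$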

Let us assume for simplicity that $g(z) = z^r$, i.e., at time $t = 0$ only a single viral particle of class $r$ is present. We can infer the asymptotic behavior of $n(t)$ from the singularities of $F(1,s)$ [23]. There are two sets of singularities, $s_0 + 2\pi n_0 i/\Omega$ and $s_1 + 2\pi n_1 i/\Omega$ with $n_0, n_1 \in \mathbb{Z}$, which are the solutions of $\chi(0,s) = 1$ and $\chi(1,s) = 1$, respectively. In each set, the singularity on the real axis is the one with the smallest modulus, so we shall ignore the remaining ones. Denote $E_0 = e^{\Omega s_0}$ and $E_1 = e^{\Omega s_1}$; then

$$ E_0 = 1 + \frac{w_+ - w_-}{w_-} = \frac{w_+}{w_-}, \qquad E_1 = 1 + \frac{w_+ - w_-}{1 + w_-} = \frac{1 + w_+}{1 + w_-}. \tag{21} $$

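But

$$ \frac{E_0}{E_1} = \frac{w_+(1 + w_-)}{w_-(1 + w_+)} = \frac{w_+ + p/q}{w_- + p/q} > 1, $$

because $w_+ > w_-$ (indeed, $w_+ - w_- = \Omega/q > 0$). Then $s_0 > s_1$, so $s_1$ is the singularity that is closest to the origin. From (21), and using $(1 + q - p + \Omega)(1 + q - p - \Omega) = 4q$,

$$ \Omega s_1 = \log\left(\frac{1 + w_+}{1 + w_-}\right) = \log\left[\frac{(1 + q - p + \Omega)^2}{4q}\right]. \tag{22} $$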
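The two expressions in (22), and the ordering $s_0 > s_1 > 0$, can be checked numerically. The sketch below (Python, standard library only) computes $s_0$ and $s_1$ from (21) for a couple of illustrative parameter pairs; the values of $p$ and $q$ are assumptions chosen only to satisfy $0 < q < p < 1/4$.

```python
import math


def singularities(p, q):
    """Omega and the real singularities s_0, s_1 of F(1, s), following eqs. (21)-(22)."""
    omega = math.sqrt(1 - 2 * (p + q) + (p - q) ** 2)
    wp = (1 - p - q + omega) / (2 * q)
    wm = (1 - p - q - omega) / (2 * q)
    s0 = math.log(wp / wm) / omega               # E_0 = w_+ / w_-
    s1 = math.log((1 + wp) / (1 + wm)) / omega   # E_1 = (1 + w_+) / (1 + w_-)
    return omega, s0, s1


for p, q in ((0.1, 0.01), (0.05, 0.001)):        # illustrative values
    omega, s0, s1 = singularities(p, q)
    closed = math.log((1 + q - p + omega) ** 2 / (4 * q)) / omega   # second form of eq. (22)
    print(p, q, s0 > s1 > 0, abs(s1 - closed))   # True, ~0 up to rounding
```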



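As $\lim_{s\to s_1} \frac{d}{ds}\left[1 - \chi(1,s)\right] = -\Omega\,\frac{1 + w_-}{w_+ - w_-}\,E_1 = -q(w_+ + 1) \ne 0$, $s_1$ is a simple pole of $\psi(1,s)$. As its residue is $-1/q$, then

$$ \psi(1,s) \sim \frac{1}{q(s_1 - s)} \quad \text{as } s \to s_1, \tag{23} $$

and therefore, since $\psi(0,s)$ is analytic at $s_1$,

$$ F(1,s) \sim \frac{1}{q^r (s_1 - s)^r} \quad \text{as } s \to s_1. \tag{24} $$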

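From this we obtain the asymptotic behavior, when $t \to \infty$, of the total population $n(t)$ as

$$ n(t) \sim \frac{1}{q^r}\,\frac{(t + r - 1)!}{(r - 1)!}\, s_1^{-t-r} \sim \frac{1}{(r-1)!\,q^r}\left(\frac{t + r - 1}{e\,s_1}\right)^{t + r - 1/2}\sqrt{\frac{2\pi e}{s_1}}. \tag{25} $$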
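The leading term of (25) can be compared with the exact total population. The sketch below (Python, standard library only) iterates the infinite-class version of (5) from a single particle and divides $n(t)$ by the first form of (25). For $r = 1$ the remaining singularities lie farther from the origin than $s_1$, so the ratio should approach 1 exponentially fast in $t$; for $r > 1$ the convergence is slower, because lower-order poles at $s_1$ contribute relative corrections of order $1/t$. The values $p = 0.1$, $q = 0.01$ and $t = 40$ are illustrative assumptions.

```python
import math


def total_population(p, q, r0, T):
    """Exact n(t) = sum_{r>=1} n_r(t), t = 0..T, for the infinite-class model, starting from one class-r0 particle."""
    n = [0.0] * (r0 + 1)
    n[r0] = 1.0
    out = []
    for _ in range(T + 1):
        out.append(sum(n))               # n[0] stays 0: class r = 0 is not counted
        new = [0.0] * (len(n) + 1)
        for r in range(1, len(n)):
            off = r * n[r]
            new[r] += (1 - p - q) * off
            if r > 1:
                new[r - 1] += p * off
            new[r + 1] += q * off
        n = new
    return out


def leading_term(t, p, q, r0):
    """First form of eq. (25): (t + r - 1)! / ((r - 1)! q^r s_1^(t + r)), with s_1 from eq. (22)."""
    omega = math.sqrt(1 - 2 * (p + q) + (p - q) ** 2)
    s1 = math.log((1 + q - p + omega) ** 2 / (4 * q)) / omega
    return math.factorial(t + r0 - 1) / (math.factorial(r0 - 1) * q ** r0 * s1 ** (t + r0))


n = total_population(0.1, 0.01, 1, 40)
print(n[40] / leading_term(40, 0.1, 0.01, 1))   # close to 1
```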

8. Discussion 

We have analyzed the transient behavior of Manrubia et al.'s model [18] for a very large number of classes ($R \gg 1$) by transforming it into a model with infinitely many classes. Although an explicit solution cannot be found, I have obtained the generating function associated with the vector of class populations. The singularities of this function provide the asymptotic time behavior of the total population of the virus, valid as long as the number of generations is smaller than $R$. Surprisingly, we find that the viral population grows super-exponentially, unlike in the steady state.

Analyzing eq. (25) more closely, we notice that $s_1$ can take very large values and thus induce an initial decay of the population. However, this decay gets dominated by the factorial $(t + r - 1)!$ as soon as $t > t_d = e\,s_1 - r + 1$. During this decay time $t_d$ (which is shorter the larger $r$ is), fluctuations of the branching process can lead the virus to extinction. Beyond that interval the virus population starts to recover and grows at a faster than exponential rate.

A standard assumption in studies of viral quasi-species evolution is that their population is in the exponential asymptotic state. But if $R \gg 1$ the time to reach this state can be very long (in fact, it requires at least $R - r$ generations to reach the optimal class, let alone to attain a stationary distribution among classes). Before that, the virus population grows faster than exponentially, and it is plausible that resources get exhausted during this transient period. This means that selection starts playing a role when the steady distribution is not yet established, leading to a behavior different from what is to be expected in the asymptotic regime. The effects of this phenomenon are as yet unexplored.